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Tomography in seismology often leads to underdetermined and inconsistent systems of 
linear equations. When solving, care must be taken to keep the propagation of data errors 
under control. 

In this paper I test the applicability of three types of damped least-squares algorithms to the 

kind of sparse matrices encountered in seismic tomography: (1) singular value decomposition 
with Lanczos iteration, (2) conjugate gradient iteration with the LSQR algorithm, and (3) the 
Dines -Lytle method. 

Lanczos iteration may be applied to large sparse systems of low rank to calculate solutions 
by singular value decomposition but becomes impractical with problems of larger size. The 
Paige-Saunders algorithm (LSQR), which incorporates Lanczos' iteration into a conjugate 
gradient method, provides a least-squares solution to the system with acceptable filtermg 
properties. In a synthetic tomographic experiment, it proved to be converging up to an order 
of magnitude faster than the Dines- Lylle algorithm, a stationary iterative process. 

For large tomographic systems, where restrictions in the available computer lime pose 
Umitations on the number of iterations, this indicates that conjugate directions methods are to 
be preferred to the more commonly applied Gauss- Seldel type of algorithms. 

To avoid unwarranted conclusions when the problem is severely underdetermined or under- 
mined with large data errors, resolution analysis must be applied to the data set. An algorithm 
for the determination of the resolution in sparse systems is given, cc i985 Academic Press, inc. 



I. Introduction 

As with other tomographic measurements, seismic data represent an integral 
property of the medium crossed by the ray path. Usually (but not always) the 
seismologist is interested in the seismic velocity within the medium, not in its absor- 
bing properties. However, the mathematical problem is not different from other 
projection problems in electron microscopy or radiology. 

Whether earthquakes or artificial sources are used as radiators of energy^ the pat- 
tern of projection angles is generally inadequate in seismic tomography. The 
resolution provided by the system of crossing rays may differ substantially for dif- 
ferent localities in the subsurface. The algebraic system is almost always ill-con- 
ditioned. Moreover, data errors can be large. 
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In this paper I shall briefly compare several iterative matrix techniques for the 
calculation of the damped least-squares solution (DLS) in a synthetic tomographic 
experiment, and develop an efficient method to calculate the resolution of a given 
data set. 

The three DLS methods explored in this paper are representative for three broad 
classes of inversion algorithms: 

(1) Singular value decomposition: the Paige -Lanczos algorithm; 

(2) Conjugate directions methods: the Paige Saunders algorithm; 

(3) Stationary iterative processes: the Dines- Lytic algorithm. 

Although seismology will act as leitmotiv throughout this paper, the conclusions 
and the methods to calculate the resolution are useful for any tomographic system 
with imperfect illumination. 



2. Seismic Tomography 

Tomographic methods have first been used in global seismology by Aki ct. al. 
[1] and the number of applications in this field is growing fast (see Whilcombe 
[17] for further references). Some recent applications in exploration seismics are 
listed in McMechan [9]. 

In general wc arc interested in the seismic velocity i"(r) at location r within a 
region R, We shall denote the inverse of the velocity (the slowness) by .v(r)= l/t^(r). 
The problem may be formulated in any number of dimensions (/)) but usually 
£) = 3. Sometimes shooting is done along one line only, so that the subsurface must 
be assumed homogeneous in one of the spatial coordinates and Z) = 2. On the other 
hand, time may be an independent parameter as well, e.g., when we monitor an oil 
reservoir or the underground gasification of a coal seam. In that case the theory in 
this paper can easily be adapted to include the time, if we scale time by some 
velocity v^, appropriate for the velocity with which the model geometry is changing. 
In that case we set D-A and = Tq/. 

Because perturbations in the velocity field cause ray bending, the relationship 
between the travel times of seismic rays and .v(r) is nonlinear. We linearize the 
problem to first order by specifying a starting model .yo(r). Deviations .y(r) to the 
starting model are related in an approximate way to deviations tj from the 
predicted travel time for the /th ray: 

/>[ s{r)d''T (y-l,...,A^) (1) 

where Sjr\ R is the intersection of the y th ray through /?. In classical ray theory this 
intersection is assumed to be infinitely narrow, reducing (I) to a line integral. Fer- 
mat's Principle may be invoked to argue that the value of the integrand in ( 1 ) is 
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insensitive to small perturbations in 5y. We can therefore safely calculate Sj for the 
ray in the (known starting model. This is instrumental in the linearization of the 
problem. 

In order to solve (1) we discrelizc the space R. We do this by defining a finite 
number of basis functions hf,{r) that span a subspacc of the Hilbcrt space of all 
slowness models {.v(r)}: 

4r)= thhdr)- (21 

k. i 

Without loss of generality we may assume that the basis functions are orthonor- 
mal [10]: 

In general we shall subdivide R into a number of nonovcrlapping cells, so that we 
can satisfy (3) by a simple scaling of the basis functions. In the following we adopt 

/j -(r) = V. within cell i 

— 0 outside 

where 

•'cell i 

is the cell's ''volume." With this (2) reduces to a sparse system of linear equations 

(5) 

k 

where 

[ K{x)d'\. (6) 

It will be convenient to scale the data to unit variance. If we denote the covariance 
matrix of the data by C^, this can be achieved by setting t = ^ -t\ L = * -L', so 
that 

Ls=t. (7) 

L' is a sparse matrix. This property will be destroyed in L by the scaling to unit 
variance, except if Cj '-^ is a narrow-band matrix. Usually wc lack information on 
the covariance of the data errors, and set Cy- = diag[ci, <::2v", ^\v]. with the 
estimated standard deviation in delay time i,. 
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Equation (7) is the equation we wish to solve in seismic tomography* In most 
recent applications the number of cells has not been larger than about 200. Advan- 
ces in instrumentation and data transmission techniques may soon increase this 
figure by several orders of magnitude. A typical apphcation in the near future may 
involve the inversion of, say, 200,000 arrival times for a model with 
50 X 50 X 20 = 50,000 cells, or a matrix with 10^^ elements, of which about 10^ are 
nonzero, Clayton [4] reported preliminary results of a tomographic interpretation 
of L7 X 10^ data. This clearly asks for algorithms that avoid exphcit storage of the 
elements of L. Such algorithms are invariably iterative. It is well known that their 
rate of convergence depends strongly on the type of matrix involved. Note that for 
very large systems it may be prohibitively expensive to do more than one iteration 
(at least with today's computers), even though such simple inversion ("backprojec- 
tion") has serious deficiencies in the case of inadequate illumination. This paper 
assumes that it is feasible to do at least a few iterations. 

The matrix L has an effective rank K which is in general much less than 
Min(M, A^), where M is the number of ceils ("unknowns"). Because of data errors, 
nonhnear effects, or approximations in the calculation of the ray path geometry 
[15], the equations in (7) are not only underdetermined, but inconsistent as well. A 
Gauss transform reduces (7) to a consistent system of M equations 



which leads to the least-squares solution of (7). If the elements of L were 
uncorrected and randomly distributed around O, the left Gauss transformation 
would also have the effect of making L^L a matrix with a dominant diagonal. With 
tomography, however, all elements of L are nonnegative, and preferent ray 
geometries may introduce strong correlations between its columns. For this reason 
we do not expect that iterative methods of the Gauss-Seidel type will operate very 
effectively on this system. 

In the following we shall study iterative solutions to (8)- Although we shall often 
use the matrix notation, it is important to realize that none of the techniques 
described requires the matrix L or its transpose to be stored in memory. A product 
like Ls can be calculated row by row. That is, we first calculate for the first ray 
path, then etc. A product like L^t = b can be handled in the same way, although 
then bj is not calculated in the yth step, but slowly growing until all rays passing ceU 
j have been treated. Consequently, we never calculate L^L explicitly. This avoids 
the problem that, although L is a sparse matrix with about NM^^^ nonzero 
elements, L^L has a good chance of being a dense matrix [3]. 

Parts of the model may have been illuminated by only a few rays, or some cells 
may not have been visited by any ray paths at all. We shall wish to handle indeter- 
minacies in the model by minimizing or restraining 



(8) 




(9) 
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This solution can be found by means of the singular value decomposition (SVD) 
of the matrix L. The SVD solution is given by 

<s>=VE-2V^L^t (JOa) 

where is a A^x f^; diagonal matrix of the largest eigenvalues e^'- e^, of L^L, V the 
M xK matrix of the corresponding eigenvectors. Within the AT-dimensional sub- 
space spanned by the columns of V, <s) is the unique solution to (8) with the 
smallest Euclidean norm. It is related to all other solutions of (8), hence to the true 
solution as well, through the projection matrix VV^ : 

<s>:=VV^s. (11) 

Wiggins [18] proposes to adopt VV^ as the "resolution matrix." The idea is that 
VV^ acts as a blurring window, through which we can view the true Earth. 
However, we should take extreme care with this: the row elements of the resolution 
matrix do not sum to 1, and (11) does not represent a true physical averaging. In 
Section 5 of this paper I shall give a more satisfactory approach to resolving power 
estimation in discretized systems. We note that 

Var{<.v,>}= X (12) 

Since errors are inversely proportional to the magnitude of the eigenvalues, it is 
advantageous to filter them out in some way. The DLS solution can be constructed 
by taking, instead of (10), 

<s > = V(E' + ) - ' Wt. (10b) 

SVD has so far been the most widely used inversion technique in global seismic 
tomography [1, 14 J. 



3. Damphd Least-Sqlarhs Methods 

It is possible to decompose the matrix L in /c-hl steps into a (/c + l)xA: 
bidiagonal matrix B^^ and matrices Ujt and V^, whose columns would be 
orthogonal in the absence of round-off errors [6, 16, 11]: 

(13) 

where r^- is a vector of residuals that (at least in theory) goes to zero in M or less 
iterations in the absence of round-off errors. The recursion involves only mul- 
tiplications with L and L^. Its storage requirements are modest. Only the last 
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columns of U and V are needed in memory. Together with the 2k elements of 
the algorithm requires a minimum storage of N+M + 2k, apart from possible 
storage of elements of L. 

This algorithm can be exploited in several ways. First of all, the largest eigen- 
values of T = B converge to those of L ^L, and can be calculated conveniently 
with the use of Sturm sequences [6, 13]. The effect of round-off errors is to 
introduce artificial multiplicities of single eigenvalues. These artificial eigenvalues 
can easily be identified by monitoring the convergence as k increases [8]. 

For SVD, the eigenvectors must be computed as well Unfortunately this involves 
transformation matrices of order M x k. Thus, use of the algorithm in this way is 
expected to be limited to systems of low rank, which can be described with not too 
many eigenvectors. Nevertheless it may be used to apply SVD to tomographic 
systems of somewhat larger size than currently in vogue, retaining all the advan- 
tages ofsva 

Alternatively, Paige and Saunders [II] have employed (13) for a stable, recur- 
sive DLS algorithm. In their method, the kth approximation to the solution s is 
defined to be s^. — V/^y^, where yi, is the DLS solution to the {k+ \ )xk system 
B^. y^t = e,/lt|. The method is closely related to the conjugate gradient method, in 
which the residual vectors of successive iterations are mutually orthogonal, and the 
corrections applied to s^r. in each iteration are mutually conjugate with respect to 
the matrix L. 

Because of the orthogonalization involved, the algorithm should theoretically 
converge in N steps or less. Round-off errors do make trouble, however. 
Nevertheless one may expect that the convergence rate of such methods is faster 
than that of simpler iterative methods. As an example of such methods we chose the 
algorithm proposed by Dines and Lytle [5]. It is a stationary iterative process, i.e., 
the correction to the approximate solution s*^^ after k steps is chosen proportional 
in magnitude to the residual vector r^*^ ~ t — Ls*^^: 

jj(/:)^s(^-i) + Hr^A'-^^ (14) 

Since the residual r^^Ms recalculated after each step, round-off errors are usually not 
important in stationary iterative processes. In the Dines- Lytle method one chooses 

H-SL'D (15) 

where D = diag(D,v) with D„ = {J^fi , L?} ^ and S = diag(w,), being the number 
of nonzero elements in the ith column of L. 

The reason for choosing this algorithm is that its convergence to the minimum 
norm least-squares solution is assured [7]. When stopped with a hmited number of 
iterations completed, the contributions of small eigenvalues are damped out with a 
filter only slightly different from that in (10b). This sets the Dines Lytle algorithm 
apart from other popular row-action methods such as the algebraic reconstruction 
technique, which do converge to a minimum norm, but not necessarily to the least- 
squares solution. 
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4. Numerical Expfriments 

The Perfect Data Case 

In order to test the numerical performance of the algorithms, I devised a simple 
experiment. The model consists of a rectangle of 20 x 20 ^ 200 cells. Although the 
physical dimensions are irrelevant to the results, this actually models part of the 
upper mantle (width 600 km, depth 300 km). As in global seismology, the model is 
illuminated from below by seismic rays from a source located far away (Fig. 1) 
Each of the 20 surface cells is covered by a seismograph. 

The velocities in the model differ by up to 1% from a standard model. Velocity 
deviations decrease gradually from the upper left to the lower right corner of the 
model. There is, however, a sharp discontinuity with a sudden increase in velocity, 
tilted at about 45 degrees, and a sharply defined low-velocity body 5 cells wide and 
2 cells deep. Although by itself not realistic, the model combines features commonly 
encountered in the Earth, such as a high-velocity slab, a gradual velocity change in 
lateral direction, and a (magma) body of low velocity. 

We shall study two different types of illumination. In the '"wide-angle" case. 20 
rays arrive in a regular fan with angles between - 35 and +5 degrees to the ver- 
tical. In the '"narrow-angle" case, the 20 rays arrive in a narrow fan between 5 
and - 1 degrees to the vertical. Thus, there are 400 data points, 200 unknows. 




I 1 . \ , i J 

C-Oe .O.C:j 0 i.-n/scc 

Fig. 1, The test model used in the calculations. The darkness in each of the 10 x 20 cells indicates the 
drop in velocity with respect to a homogeneous model. The colour scale is indicated below. 
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The eigenvalues of the least-squares system (8) have been calculated with the 
Lanczos algorithm, and one may study the condition of the system by looking at 
the eigenvalues of L^L (Fig. 2).- For the narrow-angle illumination, there are essen- 
tialy 20 large eigenvalues, after which the magnitude of the eigenvalues decreases 
rapidly to zero. Obviously, the average velocity in each of the 20 columns in Fig. 1 
is well-resolved, but any detail in the vertical direction can only be obtained with a 
large standard deviation in the solution. The decrease in the magnitude of the 
eigenvalues for the wide-angle case is much more gradual. The eigenvalues still have 
a considerable magnitude when eigenvalues of the narrow-angle case have all but 
disappeared. This reflects the larger variation in illumination angles as well as the 
improved vertical resolution. 

For both cases the eigenvalues were calculated with 200 iterations. In the wide- 
angle case 54 eigenvalues converged, and in the narrow-angle case only 48. It is 
evident that many iterations are necessary, more than the number of eigenvalues 
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Fig. 2. The eigenvalue patterns for wide-angle (O) and narrow-angle (□) illumination, using 
Paige-Lanczos algorithm. Eigenvalues are in order of decreasing magnitude. 
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obtained. This is especially so when small eigenvalues, usually much closer lo each 
other, are desired. 

Because the usefulness of SVD in large systems relies heavily on the possibility of 
truncating the system at a very low value of the effective rank, 1 have investigated 
how well the system of equations (7) can be fitted with only /<r eigenvectors, K being 
small with respect to the number of cells. To quantify the fit we introduce the nor- 
malized misfit /? of a solution s to (7) as 

For both the wide-angle and the narrow-angle illumination the resuhs are shown 
in Fig. 3 for I <K<40. The solutions for ^ = 40 are shown in Fig. 4. When com- 
pared to Fig. 1, we see that the resolution for the narrow-angle case is quite dis- 
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Fig. 3. Misfit of the solutions obtained with singular value decomposition, using only a few eigen 
vectors belonging lo the largest eigenvalues and using Paige-Lanczos algorithm; wide-angle (O) anc 
narrow-angle ( A ) illumination. 
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Fig. 4. SVD solution for 40 eigenvectors, (a) Narrow-angle illumination, normalized misiBt 0.030, 
and (b) wide-angle illumination, normalized misfit 0.095. 



appointing, as was expected from the eigenvalues. For the wide-angle illumination 
we see that the general trend is recovered with 40 eigenvectors, but the sharp trans- 
ition to higher velocities is quite blurred and the magma body is not recognizable. 
In the following tests we shall concentrate on the wide-angle illumination. 

With less computing effort the Paige-Saunders method performs far better 
(Fig. 5a). In only 20 iterations the misfit R is down to 0.007, as compared to 0.095 
for the SVD method. Most of the results in this paper were obtained with an early 
coded version of the algorithm, which did not incorporate damping. A later version, 
pubhshed as algorithm LSQR [12], does include damping and is optimized with 
respect to memory usage. Whenever results on computational speed are reported in 
this paper, it is for the undamped version of the Paige-Saunders algorithm. The dif- 
ferences with the damped version are not important, however. 

The computational effort to do one iteration is about equal for the two 
algorithms, but the SVD requires eigenvector determination as well It took 400 
iterations to solve (8) to close to machine precision (11 digits) as shown in Fig. 5b. 
For the narrow-angle case a comparable fit can easily be obtained, but the resulting 
model does not show the sharp contrasts of the original in Fig. L This again shows 
that the narrow-angle case is truly singular at the level of precision used. 
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Fig. 5. Paige Saunders solution for wide-angle illumination; (a) undamped solution after 20 
iterations, misfit 0.007, and (b) after convergence to almost machine precision in 400 iterations. 



Finally, we used the Dines-Lytle procedure to solve for the same data set 
(Fig. 6). The models obtained after 20 and 400 iterations look quite reasonable. The 
sharp contrast at the high-velocity slab is resolved after 400 iterations, but not the 
magma body, although its left boundary is just becoming visible. We know that 
Dines-Lytle should eventually converge to the right solution, but apparently there 
is still a lot of information contained in the small eigenvalues. 

In terms of computation time, the Paige Saunders algorithm was found to be 
superior to Dines-Lytle when we consider the misfit R (Fig. 7). An initial lead of 
the Dines-Lytle algorithm, originating from the choice of the starting vector, is 
soon regained. At a misfit level of 5%, Paige -Saunders is about 5 times faster in the 
wide-angle case, and at the 1% level it is 10 times faster in both experiments. 

4b The Noisy Data Case 

The same algorithms have been compared for their filtering properties on noise 
affecting the data. To this end I added randomly oriented vectors to the data vector 
t for the case of wise-angle illumination. The length of the random vector was fixed 
at 50% of |t| to simulate a set of data that is badly degraded by observation errors. 
The results arc shown in Fig. 8. 
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Fig. 6. Dines-Lytle solutions for wide-angle illumination (a) after 20 iterations, misfit 0.031, and (b) 
after 400 iterations, misfit 0.005. 



The SVD solution (8a) is quite successful in filtering out the velocity trend, 
although the velocity contrast at the fault is smoothed and the low velocities of the 
magma body are smeared out into the dominant ray direction. The undamped 
Paige-Saunders, (8b) stopped after five iterations, gives a much more erratic 
solution, as does the Dines-Lytle after 20 iterations (8c). 

The introduction of damping into the Paige-Saunders algorithm did improve the 
resolution of the velocity trend (Fig. 9). The introduction of damping into the 
mathematical solution is equivalent to introducing hypothetical ray segments in 
each cell separately, with zero delay time, after the system has been scaled to unit 
variance in the data. With some experimentation we found that a reasonable 
solution was obtained for a damping equivalent to a ray length of 10 km in each 
cell before scaling, i.e., about 1.6% of the average ray path length in each cell. 

In a separate test, run with a 5% random error vector, all three methods con- 
tinued to give acceptable solutions. In this case the undamped Paige-Saunders 
algorithm started to give erratic solutions when more than about 15 iterations were 
calculated. 

For large errors, the damped Paige-Saunders algorithm is to be preferred. It is 
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Fig. 7. Decrease of the normalized length of the residual vector as a function of central processor 
time on a CDC Cyber 175. 
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FiG. 9. Damped Paige-Saunders solution in the 50% noise case, misfit 0.337. 
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superior in convergence speed to Dines -Lytle, and requires much less back-up 
storage than SVD, since no eigenvectors are needed. However, it is evident from 
Figs. 8 and 9 that many unreal phenomena may turn up in the solution. A separate 
analysis is needed to establish what resolution is warranted by the data and what is 
not. This problem will be attacked in the next section. 



5. Resolution Analysis 

For the description of the resolution we use the concept of an averaging kernel, 
originally formulated by Backus and Gilbert [2] for continuous models. Their 
analysis needs to be modified for discrete, parameterized models. 

The basic idea in Backus-Gilbert resolution theory is to estimate a local average 
of the model around to. This average <5(ro)> is a linear function of the data 

<^-(ro) > = Z />/(ro) tj = Z ^(ro) L^kSk • (17) 

Our task is to determine bj{\^Y A local average of the real Earth can be constructed 
with a peak-shaped function A{r, XqY 

<5(ro)>=| A{x.x^)s{x)d^\ (18) 

where 

I ^(r,ro)^/''r-l. (19) 

"R 

Ideally, the amplitude of ^(r, x^) is negligible outside a small region round r^. As a 
measure of the width of A around fo we use 

M^(r„) = c/, I ^(nro)'lr-ror^V^r (20) 

^ R 

where the factor ir — tol''^'*^ ' ensures the correct dimensionality of W, and c^-^ is a 
scaling factor gauged with help of a simple form of A. For example, if A is 0 out- 
side, and constant inside a region with radius p/2, Cp ensures that W{Xq)-p for 
Ci = 12, C2 = Stc, C3 = 567r/9, respectively. If we develop 

^(r,ro)=X c2Aro)/i,(r) (21) 
( - 1 

we find from (18) 

<i-(ro)>= I a,(x^)s, (22) 
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SO that we identify from (17) 



Z ^jk^Mo)' (23) 



The relationship between W{To) and ^;(ro) is found by substitution of (4), (21), and 
(23) into (20): 

mro)=I.LhlkAira)bM (24) 

where 

f.-c^l \r-T,\'''^h,{vyd''r^dj (25) 

with = |r, — TqI'^ ^ ^ 6'£^ and the center of gravity of cell /. The error in the 
average <.?(ro)> is also a quadratic function of the /^/fq): 

6'Vo)-VarO(ro)> 



X 



= Z AAro)/^,(ro)Cov(/,,/.)= ^ ^(^0)'- (26) 

We are now ready to determine the factors ^y(ro). Facing a choice between minimiz- 
ing lV{r^y) or c^(ro) we introduce a trade-off parameter (0<>i^< x) and minimize 

y(vv,ro)= W^(ro) + HV(ro) (27) 
under the constraint (19), or 

Zi;PM,.(ro)=l. (28) 

If we use the approximation for / in (24) and set 

Po = d^L, 0-ZMP (29) 

i 

then we minimize 

J=h^lPP'^ + wH]h with c''b=l. (30) 

Since P is a sparse matrix, the solution of this system poses no extra storage 
problems. Equation (30) could be solved by the introduction of a Lagrange mul- 
tiplier, but in our experience the system is in this case very ill-conditioned and the 
LSQR algorithm failed to converge in a reasonable number of iterations. Saunders 
(personal communication, 1983) recommends expressing (or any other element 
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of b) first in terms of the others, and substitute in J before minimizing. This has the 
added convenience of formulating the system directly as a DLS problem, for sub- 
sequent solving with the Paige-Saunders algorithm. We truncate b and c to vectors 
p and q of length jV- 1, 

so that 

b = Bp + Ci *e, 
where e, is the unit vector [1, 0,..., 0] and 

-[.:•:] 

if we write 




Equation (30) becomes equivalent to 

Min|p'^B^(A''A) Bp + 2p'B^(A'* A)c, ^e,, 

or 

BVABp + B'^A' Aci ^ei=0. 
These are the normal equations belonging to another minimization problem 

Min lABp + cf ^AeJ. 

Substituting for A and B we find that this is essentially a DLS problem with damp- 
ing parameter vv: 




The system (31) has been solved without difficulty, vv governs the trade-off between 
kernel width and the variance of the error. 

An example of an averaging kernel, computed for the resolution of the center 
clement in the model, is shown in Fig. 10. The error vector here is 5%, This 
corresponds to a standard deviation in measured seismic delay times of about 0,1 s. 
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Fig. 10. Shape of the averaging kernel A{r^ Fq) where Tq is at the center of the rectangular model for 
the case of 5 % noise, wide-angle illumination. The standard deviation in the average is 0*001 km/5. 



which is good but not unrealistic. The elongated shape of the kernel, with a length 
comparable to the height of the model, but a width of only 3 or 4 cells, reflects the 
preferred orientation of the rays crossing the model. 

The trade-off curve in Fig. 1 1 shows how the width is a function of the standard 
deviation in the local average. There is apparently little choice: the preferred ray 
orientation leads to very precise averages (0.001 km/s), with large location uncer- 
tainty (220 km). Efforts to exchange the uncertainty are futile; at best the averaging 
length goes down to 200 km. In fact, the simple shape of the trade-off curve 
suggests that one may obtain a reasonably good approximation to it by performing 
only two calculations, one on each end of the curve. 
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Fig. 11. Trade-off between averaging length W and the standard deviation in the average for the 
center element of the model, for the case of 5 % noise, wide-angle illumination. 
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II should be noted thai we have assumed our data to be unbiased, i.e., the mean 
error to be 0. If this is not the case, the solution will be biased as well, and the 
calculation of the error with (26) will be optimistic. Moreover, if our assumption 
that the data are uncorrelated does not hold, the resolution kernels themselves will 
be in error. This situation is difficult to handle. Even if wc have an estimate of the 
data correlations, scaling the system to unit variance will destroy the sparsity of L 
in (7) and greatly increase the computational effort needed. If is not diagonal, 
the best strategy is probably to accumulate groups of strongly correlated data into 
one equation by summing. 



6. Conclusions 

In test calculations, the Paige -Saunders LSQR algorithm proved to be superior 
to the Lanczos SVD and Dines Lytle methods for the calculations of DLS models 
in seismic tomography. A feasible method to calculate the resolution, using the 
LSQR algorithm, was developed. Although the computation time for the resolution 
of one spot in the model is comparable to that needed to calculate the solution 
itself, resolution calculations arc badly needed to screen out the effects of obser- 
vation errors. 

Test calculations for the resolution in a realistic tomographic experiment show 
that it will be very difficult to obtain sufficient resolution in the upper mantle, 
where rays from teleseismic events arrive in narrow bundles, not far deflected from 
the vertical. An exception to this is in regions of deep seismicity. 
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